112 research outputs found

    A Temporal Sequence Learning for Action Recognition and Prediction

    In this work (supported in part by the National Science Foundation under grant IIS-1212948), we present a method to represent a video as a sequence of words and to learn the temporal ordering of those words as the key information for predicting and recognizing human actions. We leverage core concepts from the Natural Language Processing (NLP) literature on sentence classification to solve the problems of action prediction and action recognition. Each frame is converted into a word, represented as a vector using the Bag of Visual Words (BoW) encoding method. The words are then combined into a sentence that represents the video. The sequences of words in different actions are learned with a simple but effective Temporal Convolutional Neural Network (T-CNN) that captures the temporal ordering of information in a video sentence. We demonstrate that a key characteristic of the proposed method is its low latency, i.e., its ability to predict an action accurately from a partial sequence (sentence). Experiments on two datasets, UCF101 and HMDB51, show that the method on average reaches 95% of its final accuracy within half the video frames. The results also demonstrate that our method achieves performance comparable to the state of the art in action recognition (i.e., at the completion of the sentence) in addition to action prediction. Comment: 10 pages, 8 figures, 2018 IEEE Winter Conference on Applications of Computer Vision (WACV)
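
    To make the pipeline concrete, here is a minimal sketch (in PyTorch, with illustrative layer sizes; not the authors' exact T-CNN) of the two stages the abstract describes: each frame becomes a visual-word histogram, and a 1-D convolutional network reads the resulting sequence, so prediction also works on a partial "sentence":

```python
# Hedged sketch: BoW frame encoding + temporal 1-D CNN over the word sequence.
# Codebook size, layer widths, and data below are stand-in assumptions.
import numpy as np
import torch
import torch.nn as nn

def frame_to_word(descriptors, codebook):
    """BoW encoding: histogram of nearest codebook entries for one frame."""
    # descriptors: (n_local_features, d); codebook: (k, d)
    dists = np.linalg.norm(descriptors[:, None, :] - codebook[None, :, :], axis=2)
    hist = np.bincount(dists.argmin(axis=1), minlength=len(codebook)).astype(np.float32)
    return hist / max(hist.sum(), 1.0)  # L1-normalized visual-word histogram

class TemporalCNN(nn.Module):
    """1-D convolutions over the time axis of the frame-word sequence."""
    def __init__(self, vocab_size=1000, n_classes=101):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(vocab_size, 256, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.Conv1d(256, 256, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.AdaptiveMaxPool1d(1),  # pool over time -> handles partial videos
        )
        self.fc = nn.Linear(256, n_classes)

    def forward(self, x):             # x: (batch, vocab_size, n_frames)
        return self.fc(self.net(x).squeeze(-1))

# Usage: predict from only the first half of a video (the low-latency setting).
codebook = np.random.randn(1000, 128).astype(np.float32)   # stand-in codebook
frames = [np.random.randn(50, 128).astype(np.float32) for _ in range(32)]
words = np.stack([frame_to_word(f, codebook) for f in frames[:16]])  # partial
logits = TemporalCNN()(torch.from_numpy(words.T.copy())[None])       # (1, 101)
```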

    Contextual Understanding of Sequential Data Across Multiple Modalities

    In recent years, progress in computing and networking has made it possible to collect large volumes of data for a variety of applications in data mining and data analytics using machine learning methods. Data may come from different sources and in different shapes and forms, depending on their inherent nature and the acquisition process. In this dissertation, we focus specifically on sequential data, which have been growing exponentially in recent years on platforms such as YouTube, social media, and news agency sites. An important characteristic of sequential data is their inherent causal structure, with latent patterns that can be discovered and learned from samples of the dataset. With this in mind, we target problems in two domains, Computer Vision and Natural Language Processing, that deal with sequential data and share its common characteristics. The first is action recognition based on video data, a fundamental problem in computer vision that aims to find generalized patterns in videos in order to recognize or predict human actions. A video contains two important kinds of information: appearance and motion. These are complementary, so accurate recognition or prediction of activities or actions in video data depends significantly on our ability to extract both. However, effective extraction is a non-trivial task due to several challenges, such as viewpoint changes, camera motion, and scale variations, to name a few. It is thus crucial to design effective and generalized representations of video data that learn these variations and/or are invariant to them. We propose models that learn and extract spatio-temporal correlations from video frames using deep networks that overcome these challenges. The second problem that we study in the context of sequential data analysis is text summarization in multi-document processing. Sentences consist of sequences of words that convey context, and the summarization task requires learning and understanding the contextual information in each sentence in order to determine which subset of sentences best represents a given article. With the progress made by deep learning, better representations of words have been achieved, leading in turn to better contextual representations of sentences. We propose summarization methods that combine mathematical optimization, Determinantal Point Processes (DPPs), and deep learning models, and that outperform the state of the art in multi-document text summarization.
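
    The DPP component lends itself to a short illustration. Below is a hedged sketch (plain NumPy; the quality-times-similarity kernel and the greedy MAP step are generic textbook choices, not the dissertation's exact model) of selecting a diverse, high-quality subset of sentences:

```python
# Hedged sketch of DPP-style extractive summarization: greedy MAP over an
# L-ensemble built from sentence quality scores and pairwise similarities.
import numpy as np

def greedy_dpp_summary(embeddings, quality, k=3):
    """Pick k diverse, high-quality sentences by greedy log-det maximization."""
    # L = diag(q) @ S @ diag(q), with S the cosine-similarity Gram matrix (PSD).
    X = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    L = np.outer(quality, quality) * (X @ X.T)
    selected = []
    for _ in range(k):
        best, best_gain = None, -np.inf
        for i in range(len(L)):
            if i in selected:
                continue
            idx = selected + [i]
            sign, logdet = np.linalg.slogdet(L[np.ix_(idx, idx)])
            if sign > 0 and logdet > best_gain:   # skip degenerate subsets
                best, best_gain = i, logdet
        if best is None:
            break
        selected.append(best)
    return selected

# Toy usage: 5 sentence embeddings with hypothetical quality scores.
emb = np.random.default_rng(1).standard_normal((5, 16))
print(greedy_dpp_summary(emb, quality=np.array([0.9, 0.5, 0.8, 0.7, 0.6])))
```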

    All-Solution-Processed InGaO3

    We fabricated crystallized InGaZnO thin films by a sol-gel process followed by high-temperature annealing at 900°C. Prior to the deposition of the InGaZnO, ZnO buffer layers were also coated by the sol-gel process and then thermally annealed. After synthesis and annealing, the InGaZnO thin film on a ZnO buffer layer with preferred orientation showed periodic X-ray diffraction patterns, indicating a superlattice structure. This film consisted of nanosized grains containing two phases, InGaO3(ZnO)1 and InGaO3(ZnO)2, within the InGaZnO polycrystal. In contrast, films grown with no ZnO buffer layer, or on a randomly oriented ZnO buffer, showed no InGaZnO crystal-related patterns. This indicates that a ZnO buffer with a strong c-axis preferred orientation reduces the critical temperature for crystallization of the layered InGaZnO. The InGaZnO thin films formed with nanosized grains of the two-phase InGaO3(ZnO)m superlattice showed considerably low thermal conductivity (1.14 W m−1 K−1 at 325 K) due to phonon scattering from grain boundaries as well as from interfaces within the superlattice grains.

    Facilitation of corticospinal excitability by virtual reality exercise following anodal transcranial direct current stimulation in healthy volunteers and subacute stroke subjects

    BACKGROUND: There is growing evidence that combining non-invasive brain stimulation with motor skill training is an effective new treatment option in neurorehabilitation. We investigated the beneficial effects of transcranial direct current stimulation (tDCS) combined with virtual reality (VR) motor training. METHODS: In total, 15 healthy, right-handed volunteers and 15 patients with subacute stroke participated. Four conditions (A: active wrist exercise; B: VR wrist exercise; C: VR wrist exercise following anodal tDCS (1 mA, 20 min) over the left (healthy volunteers) or affected (stroke patients) primary motor cortex; and D: anodal tDCS without exercise) were administered in random order on separate days. We compared corticospinal excitability during and after exercise across conditions in healthy volunteers (A, B, C, D) and stroke patients (B, C, D) by measuring changes in the amplitudes of motor evoked potentials in the extensor carpi radialis muscle, elicited with single-pulse transcranial magnetic stimulation. For statistical analysis, a linear mixed model for a repeated-measures covariance pattern model with unstructured covariance within groups (healthy or stroke) was used. RESULTS: The VR wrist exercise (B) facilitated post-exercise corticospinal excitability more than the active wrist exercise (A) or anodal tDCS without exercise (D) in healthy volunteers. Moreover, post-exercise corticospinal facilitation after tDCS and VR exercise (C) was greater and was sustained for 20 min after exercise versus the other conditions in healthy volunteers (A, B, D) and in subacute stroke patients (B, D). CONCLUSIONS: The combined effect of VR motor training following tDCS was synergistic, and the short-term corticospinal facilitation was superior to that of VR training, active motor training, or tDCS without exercise alone. These results support the concept of combining brain stimulation with VR motor training to promote recovery after stroke. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/1743-0003-11-124) contains supplementary material, which is available to authorized users.
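
    As a rough illustration of the reported analysis, the following sketch fits a simplified random-intercept mixed model with statsmodels on simulated data; the paper's actual model is a repeated-measures covariance pattern model with unstructured covariance, which this simplification does not reproduce:

```python
# Hedged sketch: a simplified mixed-model analysis of condition x time
# effects on MEP amplitude. All data and effect sizes are simulated.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
subjects, conditions, times = range(15), list("ABCD"), [0, 10, 20]
rows = [{"subject": s, "condition": c, "time": t,
         "mep": 1.0 + 0.3 * (c == "C") + 0.1 * rng.standard_normal()}
        for s in subjects for c in conditions for t in times]
df = pd.DataFrame(rows)

# MEP amplitude modeled by condition x time, with subject as grouping factor.
model = smf.mixedlm("mep ~ C(condition) * time", df, groups=df["subject"])
print(model.fit().summary())
```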

    Prevention of Cross-update Privacy Leaks on Android

    Updating applications is an important mechanism for enhancing their availability, functionality, and security. However, without careful consideration, application updates can introduce new security problems. In this paper, we consider a novel attack that exploits application updates on Android: a cross-update privacy-leak attack called COUPLE. The COUPLE attack allows an application to secretly leak sensitive data through the cross-update interaction between its old and new versions; each version holds only the permissions and logic for either data collection or data transmission, so as to evade detection. We implement a runtime security system, BREAKUP, that prevents cross-update sensitive data transactions by tracking the permission-use histories of individual applications. Evaluation results show that BREAKUP's time overhead is below 5%. We further show the feasibility of the COUPLE attack by analyzing the versions of 2,009 applications (28,682 APKs). © 2018, ComSIS Consortium. All rights reserved.
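
    The permission-history idea behind BREAKUP can be illustrated with a small, platform-agnostic sketch (assumed logic and permission names for illustration only; this is not the actual system or the Android API):

```python
# Hedged sketch of cross-update permission-history tracking: deny a
# transmission-capable action when a sensitive permission was already
# used by any earlier version of the same app (the COUPLE pattern).
from dataclasses import dataclass, field

SENSITIVE = {"READ_CONTACTS", "ACCESS_FINE_LOCATION"}   # illustrative sets
TRANSMIT = {"INTERNET", "SEND_SMS"}

@dataclass
class AppHistory:
    used: set = field(default_factory=set)  # permissions used across versions

    def record_use(self, permission: str) -> None:
        self.used.add(permission)

    def allow_transmission(self, permission: str) -> bool:
        """Block network/SMS use if a sensitive permission was ever used
        by an earlier version of this app."""
        if permission in TRANSMIT and self.used & SENSITIVE:
            return False
        self.record_use(permission)
        return True

# Old version collects contacts; the update only requests INTERNET.
history = AppHistory()
history.record_use("READ_CONTACTS")            # v1 collects
print(history.allow_transmission("INTERNET"))  # v2 tries to leak -> False
```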